Variable Bit Rate Encoding Using JPEG2000
Abstract
A variable bit rate encoding method for JPEG2000 is presented. The method is suitable for encoding digital cinema content. All bit rate constraints described in the DCI Digital Cinema System Specification are satisfied. At the same time, average peak signal-to-noise ratio is maximized subject to these constraints. The encoder first creates high rate codestreams that satisfy the constraints. The high rate codestreams are subsequently analyzed and parsed to create final JPEG2000 codestreams at any desired average bit rate.

INTRODUCTION

In July 2005, Digital Cinema Initiatives published its Digital Cinema System Specification (1). In that specification, JPEG2000 was chosen for the distribution of digital cinema content. JPEG2000 is the latest international standard for image compression. It provides state-of-the-art compression performance, as well as many advanced features and functionalities. Key among them is the ability to extract multiple resolution images from a single compressed codestream. This makes it possible to easily display a 2K (2048 × 1080) image or a 4K (4096 × 2160) image from a 4K compressed codestream. Figure 1 illustrates a representative JPEG2000 encoder for digital cinema distribution. First, a component transform operates independently on each pixel of the three image color components, using a 3 × 3 matrix to obtain three new color components having most of the information concentrated in one of the new components. The particular component transform used is the one normally used to transform RGB to YCbCr. For digital cinema, the transform is applied to X’Y’Z’ input images. The transform is used solely for the purpose of compression and should not be thought of as a color space transform. No sub-sampling is allowed, i.e., the original X’Y’Z’ components and the resulting transformed components are all 4:4:4.
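The per-pixel component transform can be illustrated with a short sketch. This is a hedged example, not the paper's implementation: the matrix below is the standard forward RGB-to-YCbCr (irreversible) transform that the text says is reused for X’Y’Z’ input, and `component_transform` is a hypothetical helper name.

```python
import numpy as np

# Forward irreversible component transform (the usual RGB -> YCbCr matrix);
# for digital cinema it is applied to X'Y'Z' samples purely for decorrelation.
ICT = np.array([
    [ 0.299,     0.587,     0.114    ],
    [-0.168736, -0.331264,  0.5      ],
    [ 0.5,      -0.418688, -0.081312 ],
])

def component_transform(image):
    """image: H x W x 3 array of X'Y'Z' samples -> three decorrelated
    components (no sub-sampling; output stays 4:4:4)."""
    # For each pixel, out_i = sum_j ICT[i, j] * image[h, w, j]
    return np.einsum('ij,hwj->hwi', ICT, image.astype(np.float64))
```

Because each row of the matrix sums to 1, 0, and 0 respectively, a neutral pixel (equal values in all three input components) maps to a single nonzero output component, which is the decorrelation the text describes.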
A wavelet transform is applied independently to each new component to produce a number of transform coefficients, organized into subbands. The transform coefficients for each subband are then partitioned into 32 × 32 blocks referred to as codeblocks. This is illustrated in Figure 2. Each codeblock is then encoded independently by a codeblock encoder. For a given codeblock, encoding begins by quantizing the wavelet coefficients from the codeblock to obtain quantization indices. These quantization indices can be regarded as an array of signed integers, which can be represented using a sign array and a magnitude array. The sign array is a binary array in which the value at each point indicates whether the corresponding quantization index is positive or negative. The magnitude array can be divided into a series of binary arrays, with each such binary array containing one bit from each quantization index. The first of these arrays corresponds to the Most Significant Bits (MSBs) of the quantized magnitudes, and the last corresponds to the Least Significant Bits (LSBs). Each such array is referred to as a bitplane. Each bitplane of a codeblock is then coded using a bitplane coder. The bitplane coder used in JPEG2000 is a context-dependent, binary, arithmetic coder. It makes three passes over each bitplane of a codeblock; these passes are referred to as coding passes, and each bit in the bitplane is encoded in one of them. The resulting compressed data are referred to as compressed coding passes. The codeblock encoder also computes the amount of distortion (typically weighted mean squared error) reduction provided by each compressed coding pass, together with the length of the compressed coding pass. With this information, it is possible to define the distortion-rate slope of a compressed coding pass as the ratio of its distortion reduction to its length.
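The sign/magnitude bitplane decomposition described above can be sketched in a few lines. This is an illustrative helper (the name `to_bitplanes` is hypothetical, not part of the standard or the paper):

```python
import numpy as np

def to_bitplanes(indices):
    """Split signed quantization indices into a sign array and a list of
    magnitude bitplanes, most significant bitplane first."""
    signs = (indices < 0).astype(np.uint8)           # 1 where the index is negative
    mags = np.abs(indices)
    n_planes = max(int(mags.max()).bit_length(), 1)  # planes needed for the max magnitude
    planes = [((mags >> p) & 1).astype(np.uint8)     # bit p of every magnitude
              for p in range(n_planes - 1, -1, -1)]  # MSB down to LSB
    return signs, planes
```

Each returned plane is a binary array of the codeblock's shape; the bitplane coder then visits these planes from MSB to LSB, three coding passes per plane.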
The distortion-rate slope of a compressed coding pass is thus the amount of distortion reduction per byte that it provides, so a compressed coding pass with a larger distortion-rate slope can be considered more important than one with a smaller slope. The codeblock encoder provides the compressed coding passes, together with their lengths and distortion-rate slopes, to a codestream generation unit that decides which compressed coding passes from each codeblock will be included in the codestream. Generally, the codestream generation unit includes the compressed coding passes with the largest distortion-rate slopes in the codestream until the byte budget is exhausted. For more details on JPEG2000, the interested reader is referred to Taubman and Marcellin (2). As evident from (2, Section 8.2), it is sometimes desirable to disallow codeblock codestream termination between certain coding passes. Equivalently, it may be desirable to “group” two or more coding passes and treat them as essentially a single composite coding pass for the purpose of rate allocation. Such a composite coding pass has a single distortion-rate slope, computed as the total distortion decrease of the group divided by the total length of the group. To simplify discussion, it is assumed throughout the paper that this grouping is carried out when appropriate, and that the term “coding pass” may then refer to a composite coding pass.

Figure 1 – Block diagram of JPEG2000 encoder

The encoder operation described above defines how a single image can be encoded using JPEG2000. However, JPEG2000 can also be used to encode the images that make up an image sequence, e.g., a motion picture.
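The greedy, slope-ordered selection performed by the codestream generation unit might be sketched as follows. This is a simplified illustration under stated assumptions: header overhead is ignored, and passes are assumed to carry convex-hull slopes, so taking passes in globally decreasing slope order automatically respects the required prefix ordering within each codeblock. `CodingPass` and `select_passes` are hypothetical names:

```python
from dataclasses import dataclass

@dataclass
class CodingPass:
    block_id: int           # which codeblock the pass belongs to
    length: int             # bytes of compressed data
    distortion_drop: float  # reduction in (weighted) MSE

    @property
    def slope(self):        # distortion reduction per byte
        return self.distortion_drop / self.length

def select_passes(passes, budget):
    """Keep the passes with the largest distortion-rate slopes until the
    byte budget is exhausted (a sketch of the codestream generation unit)."""
    chosen, used = [], 0
    for p in sorted(passes, key=lambda p: p.slope, reverse=True):
        if used + p.length <= budget:
            chosen.append(p)
            used += p.length
    return chosen, used
```

Equivalently, this realizes a slope-threshold rule: there is some slope value above which every pass is included and below which every pass is discarded.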
When JPEG2000 is used to compress a sequence of images, only a few methods are previously known for determining what rate to use for each image in the sequence. One possibility is to select a fixed rate (i.e., a fixed number of bytes) for each image in the sequence. While this method is simple and allows easy implementation, it does not yield adequate performance in some applications. In many image sequences, the characteristics of the images vary immensely. Since this method assigns a fixed number of bytes to each image, the resulting decompressed image sequence exhibits large variations in quality among images. In Tzannes et al (3), adaptive selection of compression parameters is used to achieve some performance improvement when images are encoded in succession. The adaptation is performed for the current image using information gathered only from the previous images in the sequence: subsequent images are not considered when allocating rate for the current image. Furthermore, if two consecutive images in the sequence are not highly correlated (as is the case during a scene change), the adaptation falters. Another alternative to fixed rate coding was presented by Dagher et al (4). In this method, compressed images are placed in a buffer, compressed data are pulled out of the buffer at a constant rate, and new compressed images are added to the buffer as they become available. If the buffer is full when a new compressed image is to be added, the new compressed image (as well as the other images already in the buffer) is truncated so that all compressed data fit into the buffer. The resulting images have relatively low quality variation within a “sliding time window” corresponding to the length of the buffer employed. However, quality can vary widely over time-frames larger than the length of the sliding window.
In Smith and Villasenor (5), images are coded to a fixed Peak Signal-to-Noise Ratio (PSNR) using the method mentioned in the first paragraph of (2, Section 8.2.1). This method yields constant quality as defined by PSNR, but control of the average rate is not possible. The resulting rate (or equivalently, the total size of the resulting digital cinema package) is unpredictable.

Figure 2 – Wavelet transform coefficients organized into subbands and codeblocks

None of the methods described above provides any facility to place constraints on subsets of image data, such as individual images, individual components, etc. Such a facility is important for application to digital cinema. Indeed, the Digital Cinema Initiatives Specification (1) places a strict limit on the size of each individual image. In particular, no compressed image shall exceed 1,302,083 bytes in length. Additionally, no compressed color component from a 2K image shall exceed 1,041,666 bytes in length. The 2K portion of a 4K image must also satisfy this latter constraint.

SATISFYING THE CONSTRAINTS

In this section, we describe a rate control method that maximizes quality while satisfying all required constraints. The method begins by defining subsets of coding passes. A subset defined on an individual image is referred to as an image-wise subset. Image-wise subsets are useful in defining constraints on individual images. For example, in the 2K case, compressed size limits are placed on each component of an image, as well as on the entire image. Thus, the coding passes of an image would be grouped into three subsets corresponding to the three image components, together with a fourth subset comprising all coding passes in the image.
For each image, the proposed rate control method selects the coding passes having the largest distortion-rate slopes from each image-wise subset, such that the total size of the selected coding passes from that subset satisfies any image-wise constraints for that subset. The selected coding passes, together with their distortion-rate slopes and lengths, are saved for each image. Non-selected coding passes are discarded. Once all of the images are so processed, the method selects the coding passes having the largest distortion-rate slopes from those remaining in the aggregate of all images. These coding passes are selected so that the total size satisfies the total desired compressed size of the image sequence. The selected coding passes are used to form the final JPEG2000 codestreams.

More concretely, assume a 2K image is to be encoded. Coding passes from the first (transformed) component are selected until the maximum allowable total length of 1,041,666 bytes (including all header information) is reached. The coding passes so selected are those having the largest distortion-rate slopes among those in the first component. Coding passes from the first component that are not selected are discarded. This process is repeated for the second and third (transformed) components. From the aggregate (over all three components) of all remaining coding passes, those having the largest distortion-rate slopes are selected until the maximum allowable total length of 1,302,083 bytes (including all relevant header information) is reached. At this point, the coding passes with the largest slopes have been chosen; thus, the quality of the image is maximized, as allowed by the required constraints. The extension to 4K is similar, except that the component constraint of 1,041,666 bytes is applied only to the 2K portion of each component. The process described above is repeated for each image in the sequence. Since no dependencies between images exist, multiple images can be processed in parallel.
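The two-stage per-image allocation just described can be sketched as follows. This is a simplified illustration: passes are plain `(length_bytes, slope)` tuples, header overhead is ignored, and `select_passes`/`allocate_image` are hypothetical helper names:

```python
def select_passes(passes, budget):
    """Keep the largest-slope passes that fit in `budget` bytes.
    Each pass is a (length_bytes, slope) tuple."""
    chosen, used = [], 0
    for p in sorted(passes, key=lambda p: p[1], reverse=True):
        if used + p[0] <= budget:
            chosen.append(p)
            used += p[0]
    return chosen, used

def allocate_image(components, comp_limit=1_041_666, image_limit=1_302_083):
    """Two-stage per-image allocation from the text: first cap each
    transformed component at comp_limit bytes, then cap the whole image
    at image_limit bytes, always keeping the largest-slope passes."""
    survivors = []
    for comp in components:                       # stage 1: per-component caps
        kept, _ = select_passes(comp, comp_limit)
        survivors += kept                         # non-selected passes are discarded
    return select_passes(survivors, image_limit)  # stage 2: whole-image cap
```

Since each image is allocated independently, `allocate_image` can be run on all images of the sequence in parallel, as the text notes.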
It is worth noting that if fixed rate encoding is desired, the process can end here. The images obtained will be encoded at 250 Mbit/s and will satisfy all constraints. (For simplicity, we state here only the constraints for the case of 24 frames (images) per second.) If a lower (fixed) rate is desired, a suitable value can be substituted for 1,302,083 in the description above.

Once the constraints on individual images are satisfied as described above, additional rate control can be performed over the entire sequence in an optimal fashion. Suppose a desired total size for the compressed sequence is known. A desired average encoding rate is easily converted into a total size. For example, 30 minutes of content at 125 Mbit/s yields a compressed size of 2.8125 × 10^10 bytes. From the aggregate of all coding passes of all images, as selected above, coding passes are further selected until the total desired size is achieved. The selection criterion is, again, to select the coding passes having the largest distortion-rate slopes. Non-selected coding passes are discarded. The final image codestreams are formed from the selected coding passes.

The proposed approach maximizes reconstructed image quality within the relevant subsets (as feasible within the constraints). Specifically, the algorithm maximizes quality for individual images by selecting coding passes having the highest distortion-rate slopes until the maximum allowable sizes are reached. The algorithm further maximizes average quality for the entire sequence by subsequently selecting the coding passes having the highest distortion-rate slopes until the desired size for the entire compressed sequence is reached.
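The sequence-level stage can be sketched in the same style (again a simplified illustration with `(length_bytes, slope)` tuples, ignoring headers; the helper names are hypothetical):

```python
def total_budget_bytes(minutes, mbit_per_s):
    """Convert an average rate target into a total compressed size in bytes,
    e.g. 30 minutes at 125 Mbit/s -> 2.8125e10 bytes."""
    return minutes * 60 * mbit_per_s * 1_000_000 // 8

def allocate_sequence(passes_per_image, budget):
    """Final stage from the text: from the aggregate of all coding passes
    that survived the per-image constraints, keep those with the largest
    distortion-rate slopes until the sequence budget is reached."""
    pool = [p for image in passes_per_image for p in image]
    chosen, used = [], 0
    for p in sorted(pool, key=lambda p: p[1], reverse=True):
        if used + p[0] <= budget:
            chosen.append(p)
            used += p[0]
    return chosen, used
```

Because every per-image output already satisfies the DCI per-image and per-component limits, discarding further passes here can only shrink each image, so the final codestreams satisfy all constraints at the desired average rate.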